Coffee Drinking Habits and Working Lifestyle

INFO 526 - Summer 2024 - Final Project

A data-driven analysis of how much coffee people drink and which lifestyle conditions correlate with their consumption.
Author: Vera Jackson

Affiliation: School of Information, University of Arizona

Abstract

Add project abstract here.

Introduction

Coffee is one of the most popular drinks in the world, and in the USA alone, 75% of the population has reported drinking coffee, with almost half of Americans drinking coffee daily (Loftfield et al. 2016). There are many reasons why people drink coffee: for some it is a habit or part of their morning routine, some need the caffeine, and some just like the way it tastes. Studies have shown that demographic factors, such as gender and race, influence how much coffee someone drinks (Loftfield et al. 2016). However, one’s lifestyle and environment may also play a part.

The “Great American Coffee Taste Test” data set comes from TidyTuesday’s 2024 series and is compiled from survey results filled out by participants of a taste test hosted by “world champion barista” James Hoffmann and the coffee company Cometeer. Cometeer sent 4 unlabelled coffees to over 4,000 customers, who took part in a live taste test on YouTube while filling out the survey. The survey includes questions about coffee drinking habits, coffee preferences, individual taste test results for each of the 4 provided coffees, and individual demographics.

Question: What are the correlations between coffee consumption and lifestyle?

Introduction

The data set covers a wide range of questions and contains 4,042 responses. Rather than looking at demographics such as gender and race, I was curious about how someone’s lifestyle, particularly their working habits, impacts how much coffee they drink. For this question, we will only look at the following variables, listed with the answer options offered in the survey:

  • cups: “How many cups of coffee do you typically drink per day?”

    • “Less than 1”, “1”, “2”, “3”, “4”, “More than 4”
  • employment_status: “Employment Status”

    • “Retired”, “Employed full-time”, “Employed part-time”, “Homemaker”, “Student”, “Unemployed”
  • number_children: “Number of Children”

    • “None”, “1”, “2”, “3”, “More than 3”
  • wfh: “Do you work from home or in person?”

    • “I primarily work in person”, “I primarily work from home”, “I do a mix of both”
  • age: “What is your age?”

    • “<18 years old”, “18-24 years old”, “25-34 years old”, “35-44 years old”, “45-54 years old”, “55-64 years old”, “>65 years old”

After removing every response that left any of these questions unanswered, we are working with 3,343 responses.
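The filtering step can be sketched as follows. This is a minimal illustration in plain Python, not the code used for this project: the helper name keep_complete, the toy rows, and the assumption that a missing answer appears as None, an empty string, or "NA" are all hypothetical.

```python
# Column names taken from the variable list above.
REQUIRED = ["cups", "employment_status", "number_children", "wfh", "age"]

# ASSUMPTION: how a skipped question shows up in the raw data.
MISSING = {None, "", "NA"}


def keep_complete(rows, required=REQUIRED):
    """Return only the rows that answer every required question."""
    return [
        row for row in rows
        if all(row.get(col) not in MISSING for col in required)
    ]


# Toy responses invented for illustration (not taken from the survey).
sample = [
    {"cups": "2", "employment_status": "Student", "number_children": "None",
     "wfh": "I do a mix of both", "age": "18-24 years old"},
    {"cups": "1", "employment_status": "Retired", "number_children": "2",
     "wfh": None, "age": ">65 years old"},          # skipped the wfh question
    {"cups": "3", "employment_status": "Homemaker", "number_children": "1",
     "wfh": "I primarily work from home"},          # no age recorded at all
]

complete = keep_complete(sample)
print(len(complete), "of", len(sample), "responses kept")
```

Only the first toy row survives, because it is the only one with an answer for all five variables.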

I chose to analyze this question for this data set because I was curious about what may influence coffee drinkers' consumption habits, in particular their working conditions. Whether someone is retired or a student, or works from home or at the office, may shape their environment and influence their habits. In addition to employment and work-from-home status, I also included how many children they have. This is especially relevant for homemakers, and raising children can also be considered a form of labor. Finally, I included age, as this could be another explanatory variable for how much coffee someone drinks.

Overall, my goal for this question is to highlight a trend between coffee drinking and lifestyle that could be reflective of the general American coffee-drinking population.

Approach

Using the cups variable, a pie chart was made to show the distribution of responses across the entire cleaned data set. The pie chart is color-coded by the response to how many cups of coffee one drinks per day and labeled with the percentage of the data set that each response makes up.
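The percentages behind such a pie chart could be computed roughly as below. This is a sketch under stated assumptions: the function name cup_shares and the toy answers are hypothetical, and the input is assumed to be a non-empty list of cups answers.

```python
from collections import Counter

# Answer options for the cups question, in survey order.
CUP_LEVELS = ["Less than 1", "1", "2", "3", "4", "More than 4"]


def cup_shares(answers):
    """Percentage of responses per cups answer, rounded to one decimal.

    Assumes `answers` is non-empty; levels nobody chose get 0.0.
    """
    counts = Counter(answers)
    total = len(answers)
    return {level: round(100 * counts[level] / total, 1) for level in CUP_LEVELS}


# Toy answers invented for illustration (not survey results).
toy_answers = ["1", "2", "2", "3", "Less than 1", "2", "1", "More than 4"]
shares = cup_shares(toy_answers)
for level, pct in shares.items():
    print(f"{level}: {pct}%")
```

Each value is the size of one pie slice, so the values should sum to roughly 100 (rounding can shift the total slightly on real data).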

Once an overall average was observed, a point plot with error bars ranging from the 10th to the 90th percentile was constructed, faceted by the explanatory variables (employment_status, number_children, wfh, age). To obtain the mean and percentiles to be plotted, some calculations had to be conducted first.
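One way such a calculation could look is sketched below. It is not the project's actual code. Because the cups answers are categories, the sketch has to map them to numbers first; the stand-in values 0.5 for “Less than 1” and 5 for “More than 4” are assumptions made purely for illustration, as are the function name and the toy rows.

```python
from collections import defaultdict
from statistics import mean, quantiles

# ASSUMPTION: numeric stand-ins for the categorical answers. The two
# open-ended options have no exact value, so 0.5 and 5 are placeholders.
CUPS_TO_NUMBER = {"Less than 1": 0.5, "1": 1, "2": 2, "3": 3, "4": 4, "More than 4": 5}


def summarize_by_group(rows, group_col):
    """Mean, 10th and 90th percentile of cups for each level of group_col.

    Each group needs at least two responses for the percentiles.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(CUPS_TO_NUMBER[row["cups"]])

    summary = {}
    for name, values in groups.items():
        # n=10 yields the nine decile cut points: index 0 is the
        # 10th percentile and index 8 is the 90th percentile.
        deciles = quantiles(values, n=10, method="inclusive")
        summary[name] = {"mean": mean(values), "p10": deciles[0], "p90": deciles[8]}
    return summary


# Toy rows invented for illustration (not survey results).
toy_rows = [
    {"cups": "1", "wfh": "I primarily work from home"},
    {"cups": "2", "wfh": "I primarily work from home"},
    {"cups": "3", "wfh": "I primarily work from home"},
    {"cups": "2", "wfh": "I primarily work in person"},
    {"cups": "4", "wfh": "I primarily work in person"},
]
summary = summarize_by_group(toy_rows, "wfh")
for group, stats in summary.items():
    print(group, stats)
```

The mean gives the point for each group, and p10 and p90 give the lower and upper ends of its error bar. A different numeric mapping would shift all three values, so the choice of stand-ins should be stated alongside any plot built this way.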

The point plot was best suited to display the similarities and differences between the averages of each group. The percentiles also suggested that certain groups may be more likely to fall above or below the average, so to analyze this further, one final plot was constructed.

A diverging bar chart, grouped by explanatory variable and filled by the six possible answers to the “cups” variable, was then constructed. Only the answers outside of the average were plotted, so that we could focus on which groups are most likely to drink less or more than the average.
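The data preparation for a diverging bar chart of this kind might be sketched as follows. The text does not specify which answer counts as the average, so the sketch assumes a single "typical" answer (here “2”, a hypothetical choice) that is left out, with answers below it given negative shares and answers above it given positive shares. The function name and toy answers are also hypothetical.

```python
from collections import Counter

# Answer options for the cups question, in survey order.
CUP_LEVELS = ["Less than 1", "1", "2", "3", "4", "More than 4"]


def diverging_shares(answers, typical="2"):
    """Signed percentage per cups answer for one group.

    ASSUMPTION: `typical` stands in for the average answer. It is omitted
    from the result; answers below it are negative (bars pointing left)
    and answers above it are positive (bars pointing right).
    """
    counts = Counter(answers)
    total = len(answers)
    pivot = CUP_LEVELS.index(typical)
    shares = {}
    for position, level in enumerate(CUP_LEVELS):
        if position == pivot:
            continue  # the typical answer is not plotted
        sign = -1 if position < pivot else 1
        shares[level] = sign * 100 * counts[level] / total
    return shares


# Toy answers for a single group, invented for illustration.
toy_group = ["1", "2", "2", "3", "More than 4", "Less than 1", "2", "2"]
result = diverging_shares(toy_group)
for level, pct in result.items():
    print(f"{level}: {pct:+.1f}%")
```

Running this once per group (for example, per employment status) yields one set of left-pointing and right-pointing bars for each group, which makes it easy to see which groups lean toward drinking less or more.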

Analysis

Discussion

References